Abstract This paper presents a new condition for the existence of optimal stationary policies in average-cost continuous-time Markov decision processes with unbounded cost and transition rates, arising from controlled queueing systems. The condition is closely related to the stability of queueing systems: it suggests that a proof of stability can be exploited to verify the existence of an optimal stationary policy. The new condition is easier to verify than existing conditions. Moreover, several conditions are provided under which the average-cost optimality equality holds.
1 Introduction

Queueing systems have wide applications in computer communication networks, manufacturing processes and customer service platforms [2]. A large body of literature studies issues such as system stability and cost and performance analysis under a given service discipline, e.g., [HHS]. To reduce the operational cost and better serve customers, queueing models should be controlled so that the operational cost is minimized, e.g., [7]. Many controlled queueing models can be analyzed as continuous-time Markov decision processes (CTMDPs) [12]. The buffer of the queue is often unlimited, and the transition rates may depend on the system state. Therefore, the corresponding CTMDP often has a denumerable state space and unbounded transition rates. Moreover, the state-dependent cost rates are also unbounded. A natural question is whether an optimal stationary policy exists for such a CTMDP. For the discounted-cost CTMDP, it is often relatively easy to verify whether an optimal stationary policy exists. However, for the average-cost CTMDP, additional conditions must be imposed to ensure the existence of an optimal stationary policy. This paper provides a new condition, different from existing ones, under which an average-cost optimal stationary policy exists.

In 2002, [6] presented a set of conditions under which an average-cost optimal stationary policy exists. These conditions require constructing a sequence of functions satisfying the proposed assumptions, which is not straightforward for most problems encountered in practice. Later, in 2009, [1] gave a sufficient condition for the existence of optimal stationary policies, which also requires finding a function satisfying several conditions. However, this function is often problem specific: without a thorough study of the particular CTMDP, it is not easy to find an appropriate function. It would therefore be valuable to bypass the search for such a function. For discrete-time Markov decision processes (DTMDPs), [10] gives several conditions under which an average-cost optimal stationary policy exists. [10] also notes that a CTMDP can be transformed into a DTMDP by the uniformization method if the transition rates are uniformly bounded. However, uniformization cannot be applied when the transition rates are unbounded, so CTMDPs with unbounded transition rates need to be analyzed separately.

In this paper, a new condition is provided that ensures the existence of an average-cost optimal stationary policy for denumerable-state CTMDPs with unbounded cost and transition rates. The condition asks whether, under a given policy, the expected time and the expected cost of a first passage from any state to a given state are finite. The former is related to the stability of the queueing system, while the latter may be seen as a generalized stability, since the expected cost equals the expected time when the cost rate is 1. Much of the literature focuses on the stability of queueing systems, e.g., [?], and those results can help verify whether an average-cost optimal stationary policy exists.

This paper is organized as follows. In Section 2, we introduce the CTMDP model and present our main result. Section 3 gives the proof of the main result. In Section 4, we give conditions under which the average-cost optimality inequality (ACOI) becomes an equality. Section 5 illustrates the results with a queueing example.
2 Model and Main Result


Consider a continuous-time Markov decision process $\{x(t) : t \ge 0\}$ specified by the four-element tuple $\{S, (A(i), i \in S), q(j|i,a), c(i,a)\}$:

1. The state space $S$ is denumerable;

2. Each action space $A(i)$ is a subset of the finite action space $A$;

3. The transition rates satisfy $q(j|i,a) \ge 0$ for all $i \ne j$, $i, j \in S$, $a \in A(i)$, and $\sum_{j \in S} q(j|i,a) = 0$ for all $i \in S$, $a \in A(i)$;

4. The cost rate function satisfies $c(i,a) \ge 0$ for all $i \in S$, $a \in A(i)$.

Let $\Pi$ be the set of all randomized Markov policies and $F$ the set of all stationary policies [4]. Given $\pi = (\pi_t) \in \Pi$ and a discount factor $\alpha > 0$, we define the expected discounted cost function (with initial state $i$)

$$J_\alpha(i,\pi) = \int_0^\infty e^{-\alpha t} E_i^\pi[c(x(t),\pi_t)]\,dt, \quad \forall i \in S,\ \pi \in \Pi, \qquad (1)$$

and the corresponding optimal discounted cost function $J_\alpha^*(i) = \inf_{\pi \in \Pi} J_\alpha(i,\pi)$, $\forall i \in S$, where $c(i,\pi_t)$ is the expected cost rate at state $i$ under $\pi_t$ at time $t$, defined as $c(i,\pi_t) = \int_{A(i)} c(i,a)\,\pi_t(da|i)$.

It is straightforward to show that for any stationary policy $f \in F$,

$$\alpha J_\alpha(i,f) = c(i,f(i)) + \sum_{j \in S} J_\alpha(j,f)\, q(j|i,f(i)). \qquad (2)$$


Since $A(i)$ is finite, [1] states that $J_\alpha^*(i)$ is well defined and satisfies the discounted-cost optimality equation

$$\alpha J_\alpha^*(i) = \min_{a \in A(i)} \Big\{ c(i,a) + \sum_{j \in S} J_\alpha^*(j)\, q(j|i,a) \Big\}. \qquad (3)$$


Moreover, we define the long-run expected average cost function

$$J_c(i,\pi) = \limsup_{T \to \infty} \frac{1}{T} \int_0^T E_i^\pi[c(x(t),\pi_t)]\,dt, \quad \forall i \in S,\ \pi \in \Pi, \qquad (4)$$

and the corresponding optimal average cost function $J_c^*(i) = \inf_{\pi \in \Pi} J_c(i,\pi)$, $\forall i \in S$.

One of the most important questions is whether an average-cost optimal stationary policy exists for the CTMDP. Before stating our main result, we introduce the following definition [?].
Definition 2.1. Let $d$ be a (randomized) stationary policy. Then $d$ is an $i_0$-standard policy if the Markov process $\{x(t) : t \ge 0\}$ induced by $d$ satisfies, for any $i \in S$, that the expected time $m_{i,i_0}(d)$ and the expected cost $c_{i,i_0}(d)$ of a first passage from $i$ to $i_0$ (during which at least one transition occurs) are both finite.

Remark 1: Note that $x(t) = x(t+)$ almost surely. Thus, if we defined the first passage time as $\tau_{i,i_0} = \inf\{t > 0 : x(t) = i_0 \mid x(0) = i\}$, then $\tau_{i,i_0} = 0$ almost surely when $i = i_0$. Hence we impose the additional constraint that at least one transition occurs in the definition of the first passage time.

Remark 2: If the cost rate function is bounded, then $m_{i,i_0}(d) < \infty$ implies $c_{i,i_0}(d) < \infty$. In this case, $d$ is an $i_0$-standard policy if the Markov process induced by $d$ is ergodic (i.e., irreducible and positive recurrent).
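To make Definition 2.1 concrete, the following Python sketch computes the expected first-passage cost $c_{i,i_0}$ (and, with unit cost, the expected time $m_{i,i_0}$) for a finite Markov process by solving the first-step equations $\sum_j q(j|i) h(j) = -c(i)$, $i \ne i_0$, with $h(i_0) = 0$. It assumes a finite (truncated) state space, so for a denumerable model it is only an approximation; the helper names and the truncated M/M/1 test case (arrival rate 1, service rate 2, cost rate $c(i) = i$) are hypothetical.

```python
from typing import List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def first_passage(Q: List[List[float]], cost: List[float], target: int) -> List[float]:
    """Expected cost of a first passage to `target` (at least one transition) from each state.

    For i != target, h(i) solves sum_j Q[i][j] h(j) = -cost[i] with h(target) = 0.
    The entry for `target` itself is the expected cost of a return to `target`.
    Passing cost = [1.0] * n gives expected first-passage times instead.
    """
    n = len(Q)
    others = [i for i in range(n) if i != target]
    A = [[Q[i][j] for j in others] for i in others]
    h = solve_linear(A, [-cost[i] for i in others])
    out = [0.0] * n
    for k, s in enumerate(others):
        out[s] = h[k]
    rate_out = -Q[target][target]  # total rate of leaving `target`
    out[target] = (cost[target] + sum(Q[target][j] * out[j] for j in others)) / rate_out
    return out


def mm1_generator(lam: float, mu: float, n_max: int) -> List[List[float]]:
    """Generator of an M/M/1 queue truncated at n_max customers (arrivals blocked there)."""
    n = n_max + 1
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i < n_max:
            Q[i][i + 1] = lam
        if i > 0:
            Q[i][i - 1] = mu
        Q[i][i] = -sum(Q[i])  # diagonal is still 0 here, so this is minus the outflow
    return Q


Q = mm1_generator(1.0, 2.0, 60)
times = first_passage(Q, [1.0] * 61, 0)
costs = first_passage(Q, [float(i) for i in range(61)], 0)
print(times[:3], costs[:3])
```

For this test case the computed values agree, up to negligible truncation error, with the closed forms for the untruncated queue, $m_{i,0} = i/(\mu-\lambda)$ and $c_{i,0} = i^2/(2(\mu-\lambda)) + (\lambda+\mu)i/(2(\mu-\lambda)^2)$, so state 0 passes both finiteness checks of Definition 2.1 here.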

The following lemma is widely used in analyzing the stability of queueing systems; its proof is omitted for brevity.

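Lemma 2.1. Consider a Markov process on $S$ with transition rates $q(j|i)$ and cost rate $c(i)$, and let $c_{i,i_0}$ denote the expected cost of a first passage from $i$ to $i_0$. Assume that $q_i := -q(i|i) < \infty$, $\forall i \in S$. Assume further that there exist a (finite) nonnegative function $r$ on $S$ and a finite subset $H^*$ containing $i_0$ such that

$$\sum_{j} q(j|i)\, r(j) < \infty, \quad i \in H^*, \qquad (5)$$

and

$$c(i) + \sum_{j} q(j|i)\, r(j) \le 0, \quad i \notin H^*. \qquad (6)$$

Then there exists a (finite) nonnegative constant $\Gamma$ such that $c_{i,i_0} \le r(i) - r(i_0) + \Gamma$, $\forall i \ne i_0$. In particular, if $H^* = \{i_0\}$, then $c_{i,i_0} \le r(i)$, $\forall i \ne i_0$.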
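As a hedged illustration of Lemma 2.1 on a hypothetical example, consider an M/M/1 queue with arrival rate $\lambda = 1$, service rate $\mu = 2$ and cost rate $c(i) = i$, with $i_0 = 0$ and $H^* = \{0\}$. The quadratic function $r(i) = i^2/(2(\mu-\lambda)) + (\lambda+\mu)i/(2(\mu-\lambda)^2)$ is an assumed Lyapunov candidate; substituting it by hand shows that the left side of (6) is identically 0 for $i \ge 1$. The Python sketch below checks (5) and (6) with exact rational arithmetic, but only on a finite range of states, so it is a sanity check rather than a proof.

```python
from fractions import Fraction

LAM, MU = Fraction(1), Fraction(2)  # hypothetical arrival and service rates, LAM < MU


def cost(i: int) -> Fraction:
    """Cost rate c(i) = i (number of customers in the system)."""
    return Fraction(i)


def r(i: int) -> Fraction:
    """Assumed quadratic Lyapunov candidate r(i) for the M/M/1 queue."""
    gap = MU - LAM
    return Fraction(i * i) / (2 * gap) + (LAM + MU) * i / (2 * gap * gap)


def lhs_of_6(i: int) -> Fraction:
    """c(i) + sum_j q(j|i) r(j): up-jumps at rate LAM, down-jumps at rate MU (if i > 0)."""
    total = LAM * (r(i + 1) - r(i))
    if i > 0:
        total += MU * (r(i - 1) - r(i))
    return cost(i) + total


# Condition (6) with H* = {0}: the left side must be <= 0 for every i != 0.
violations = [i for i in range(1, 500) if lhs_of_6(i) > 0]
# Condition (5) at i0 = 0: sum_j q(j|0) r(j) = LAM * (r(1) - r(0)) is finite.
at_zero = LAM * (r(1) - r(0))
print(violations, r(1), r(3))  # prints: [] 2 9
```

Lemma 2.1 then gives $c_{i,0} \le r(i)$; for instance $c_{1,0} \le r(1) = 2$, which in this example is attained with equality.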

Let $S = \{0, 1, 2, \dots\}$. We now state our main result.

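Theorem 2.1. Assume that $J_\alpha^*(i)$ is increasing in $i$ for every $\alpha > 0$. If there exists a 0-standard policy $d$, then there exist a constant $g^* \ge 0$, a stationary policy $f^*$, and a real-valued function $h^*$ (which is increasing in $i$) such that:

(i) There exists a sequence $\{\alpha_n, n \ge 1\}$ tending to zero (as $n \to \infty$) such that, $\forall i \in S$,

$$f^*(i) = \lim_{n \to \infty} f_{\alpha_n}^*(i), \qquad g^* = \lim_{n \to \infty} \alpha_n J_{\alpha_n}^*(0), \qquad (7)$$

and

$$h^*(i) = \lim_{n \to \infty} h_{\alpha_n}(i), \qquad (8)$$

where $h_\alpha(i) := J_\alpha^*(i) - J_\alpha^*(0)$ and $f_\alpha^*$ denotes a stationary policy attaining the minimum in (3).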
(ii) $(g^*, f^*, h^*)$ satisfies the following average-cost optimality inequality (ACOI):

$$g^* \ge c(i,f^*(i)) + \sum_{j \in S} h^*(j)\, q(j|i,f^*(i)) = \min_{a \in A(i)} \Big\{ c(i,a) + \sum_{j \in S} h^*(j)\, q(j|i,a) \Big\}, \quad \forall i \in S, \qquad (9)$$

and $f^*$ is an average-cost optimal stationary policy.

Remark 1: The above result still holds when the state is a vector rather than a scalar. 

Remark 2: The monotonicity of the discounted value function $J_\alpha^*(i)$ is often satisfied; e.g., in queueing systems, more customers in the queue imply more waiting.

Remark 3: The 0-standard policy $d$ is not required to be optimal. It can be any policy that is easy to construct and analyze.

Remark 4: This theorem closely relates the existence of an average-cost optimal stationary policy to the stability of the queueing system under a given service policy. The queueing system is called stable under a given service policy if the induced Markov process is ergodic (irreducible and positive recurrent). Positive recurrence implies that for any $i \in S$, the expected time $m_{i,0}$ of a first passage from $i$ to 0 is finite. To prove positive recurrence, i.e., the finiteness of $m_{i,0}$, a Lyapunov function $r(\cdot)$ may be constructed so as to apply Lemma 2.1 with $c(i) = 1$ and $H^* = \{0\}$ (noting that the expected cost equals the expected time when the cost rate is 1). This method can also be found in Theorem 1.18 in [3]. In many situations, a slight modification of the Lyapunov function $r(\cdot)$ constructed for proving $m_{i,0} < \infty$ yields another Lyapunov function satisfying (6), and thus the finiteness of the expected cost $c_{i,0}$ can be proved. That is, the analysis of the stability of the queueing system can help prove the existence of an average-cost optimal stationary policy.


3 Proof of Theorem 2.1

[1] proposes the following assumptions to ensure the existence of an average-cost optimal stationary policy, which can be seen as a continuous-time counterpart of the (SEN) assumptions proposed in [10].

Assumptions A: For some decreasing sequence $\{\alpha_n, n \ge 1\}$ tending to zero (as $n \to \infty$) and some state $i_0 \in S$,

(A1) $\alpha_n J_{\alpha_n}^*(i_0)$ is bounded in $n$.

(A2) There exists a nonnegative (finite) function $H$ such that $h_{\alpha_n}(i) \le H(i)$, $\forall i \in S$, $n \ge 1$, where $h_\alpha(i) = J_\alpha^*(i) - J_\alpha^*(i_0)$.

(A3) There exists a nonnegative constant $L$ such that $-L \le h_{\alpha_n}(i)$, $\forall i \in S$, $n \ge 1$.

Before proving Theorem 2.1, we give some results on the Markov process $\{x(t) : t \ge 0\}$. Let $J_i(t) = \frac{1}{t} E\left[\int_0^t c(x(s))\,ds \,\middle|\, x(0) = i\right]$, $i \in S$. We have the following result.


Proposition 3.1. Let $R$ be a positive recurrent class.

(i) For $i \in R$, $\lim_{t \to \infty} J_i(t)$ exists and equals the (finite or infinite) constant $J_R := \sum_{j \in R} \pi_j c(j)$, where $\pi_j$ is the steady-state probability of being in state $j$.

(ii) For $i \in R$, $J_R = c_{i,i}/m_{i,i}$.

(iii) $J_R = \sum_{j \in R} \pi_j E[c(x(t)) \mid x(0) = j]$, $\forall t \ge 0$.


Proof. Let $e_{i,j}$ be the expected time spent in state $j$ during a first passage from $i$ to $i$. Then $\pi_j = e_{i,j}/m_{i,i}$. Therefore, $J_R = \sum_{j \in R} \pi_j c(j) = \sum_{j \in R} e_{i,j} c(j)/m_{i,i} = c_{i,i}/m_{i,i}$, and thus (ii) holds.

Note that $J_i(t) = \sum_{j} c(j)\, E\left[\int_0^t 1(x(s) = j)\,ds \,\middle|\, x(0) = i\right] t^{-1}$ and $\lim_{t \to \infty} E\left[\int_0^t 1(x(s) = j)\,ds \,\middle|\, x(0) = i\right] t^{-1} = \pi_j$. By Fatou's lemma, $\liminf_{t \to \infty} J_i(t) \ge J_R$. Thus, if $J_R = \infty$, the limit exists and equals $\infty$, $\forall i \in R$. If $J_R < \infty$, then (i) follows from the renewal reward theorem (see [9]).

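Next we prove (iii). Note that $E[c(x(t)) \mid x(0) = j] = \sum_{k \in S} p(j,k,t)\, c(k)$, where $p(j,k,t) = P[x(t) = k \mid x(0) = j]$. Since $\sum_{j \in R} \pi_j p(j,k,t) = \sum_{j \in S} \pi_j p(j,k,t) = \pi_k$ (noting that $\pi_i = 0$ for $i \in S \setminus R$), we have

$$\sum_{j \in R} \pi_j E[c(x(t)) \mid x(0) = j] = \sum_{j \in R} \pi_j \sum_{k \in S} p(j,k,t)\, c(k) = \sum_{k \in S} c(k) \sum_{j \in R} \pi_j p(j,k,t) = \sum_{k \in S} c(k)\, \pi_k = \sum_{k \in R} c(k)\, \pi_k = J_R,$$

where the interchange of the order of summation is valid as all terms are nonnegative. □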
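Proposition 3.1 (ii) can be checked exactly on a finite birth-death chain. The sketch below assumes a hypothetical M/M/1 queue truncated at `n_max` customers (arrivals blocked there) with cost rate $c(j) = j$. It obtains $\pi$ from detailed balance and $c_{0,0}$, $m_{0,0}$ from the downward first-passage recursions $\mu t_j = 1 + \lambda t_{j+1}$ and $\mu k_j = c(j) + \lambda k_{j+1}$, which hold for birth-death chains only; `fractions.Fraction` keeps the comparison exact.

```python
from fractions import Fraction


def renewal_reward_check(lam, mu, n_max, cost):
    """Return (sum_j pi_j c(j), c_{0,0} / m_{0,0}) for an M/M/1 queue truncated at n_max."""
    # Stationary distribution by detailed balance: pi_{j+1} = pi_j * lam / mu.
    weights = [(lam / mu) ** j for j in range(n_max + 1)]
    total = sum(weights)
    j_r = sum(w / total * cost(j) for j, w in enumerate(weights))

    # Expected time t_j and cost k_j of a passage from j down to j-1:
    #   mu * t_j = 1 + lam * t_{j+1},  mu * k_j = c(j) + lam * k_{j+1},
    # with t_{n_max+1} = k_{n_max+1} = 0 because arrivals are blocked at n_max.
    t, k = Fraction(0), Fraction(0)
    for j in range(n_max, 0, -1):
        t = (1 + lam * t) / mu
        k = (cost(j) + lam * k) / mu
    # A cycle from 0 back to 0: an Exp(lam) sojourn in state 0, then a passage from 1 to 0.
    m00 = 1 / lam + t
    c00 = cost(0) / lam + k
    return j_r, c00 / m00


j_r, ratio = renewal_reward_check(Fraction(1), Fraction(2), 10, lambda j: Fraction(j))
print(j_r == ratio, float(j_r))
```

The two quantities coincide as exact rationals, and their common value is close to the untruncated average queue length $\rho/(1-\rho) = 1$ for $\rho = 1/2$.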


Proposition 3.2. Suppose that $d$ is an $i_0$-standard policy with positive recurrent class $R$. Let $J_R(d)$ and $\pi_i(d)$ be defined as in Proposition 3.1. Then

$$J_R(d) = \alpha \sum_{i \in R} \pi_i(d)\, J_\alpha(i,d), \quad \forall \alpha > 0. \qquad (10)$$

ieR 


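Proof. It follows from (1) and Proposition 3.1 (iii) that

$$\begin{aligned} \alpha \sum_{i \in R} \pi_i(d)\, J_\alpha(i,d) &= \alpha \sum_{i \in R} \pi_i(d) \int_0^\infty e^{-\alpha t} E^d[c(x(t),d) \mid x(0) = i]\,dt \\ &= \alpha \int_0^\infty e^{-\alpha t} \sum_{i \in R} \pi_i(d)\, E^d[c(x(t),d) \mid x(0) = i]\,dt = \alpha \int_0^\infty e^{-\alpha t} J_R(d)\,dt = J_R(d), \end{aligned}$$

where the interchange of summation and integration is valid as all terms are nonnegative. □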
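Identity (10), and the consequence $\alpha J_\alpha(0,d) \le J_R(d)/\pi_0(d)$ used in the proof of Theorem 2.1, can be checked exactly on a finite example. The sketch below assumes a hypothetical M/M/1 queue truncated at `n_max` under a single fixed policy, so $J_\alpha(\cdot,d)$ is the solution of the linear system $(\alpha I - Q)J = c$ given by (2); it solves this tridiagonal system with the Thomas algorithm in exact rational arithmetic.

```python
from fractions import Fraction


def discounted_costs(lam, mu, n_max, alpha, cost):
    """J_alpha(i, d) for an M/M/1 queue truncated at n_max: solve (alpha*I - Q) J = c, cf. (2)."""
    n = n_max + 1
    sub = [-mu if i > 0 else Fraction(0) for i in range(n)]       # entries (i, i-1)
    sup = [-lam if i < n_max else Fraction(0) for i in range(n)]  # entries (i, i+1)
    diag = [alpha - sub[i] - sup[i] for i in range(n)]            # alpha + outflow rate of i
    cp, dp = [Fraction(0)] * n, [Fraction(0)] * n
    for i in range(n):  # forward sweep of the Thomas (tridiagonal) algorithm
        denom = diag[i] - (sub[i] * cp[i - 1] if i > 0 else 0)
        cp[i] = sup[i] / denom
        dp[i] = (cost(i) - (sub[i] * dp[i - 1] if i > 0 else 0)) / denom
    J = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):  # back substitution
        J[i] = dp[i] - (cp[i] * J[i + 1] if i < n - 1 else 0)
    return J


def check_identity_10(lam, mu, n_max, alpha, cost):
    """Return J_R, alpha * sum_i pi_i J_alpha(i), alpha * J_alpha(0) and J_R / pi_0."""
    weights = [(lam / mu) ** j for j in range(n_max + 1)]
    total = sum(weights)
    pi = [w / total for w in weights]
    j_r = sum(p * cost(j) for j, p in enumerate(pi))
    J = discounted_costs(lam, mu, n_max, alpha, cost)
    return j_r, alpha * sum(p * x for p, x in zip(pi, J)), alpha * J[0], j_r / pi[0]


for alpha in (Fraction(1), Fraction(1, 10), Fraction(1, 100)):
    j_r, rhs, a_j0, bound = check_identity_10(Fraction(1), Fraction(2), 8, alpha, lambda j: j)
    print(alpha, j_r == rhs, a_j0 <= bound)
```

Algebraically, multiplying $(\alpha I - Q)J = c$ on the left by $\pi$ and using $\pi Q = 0$ gives $\alpha \pi J = \pi c = J_R$, which is exactly what the exact comparison confirms for each discount factor.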


Proposition 3.3. Assume that $J_\alpha^*(i_0) < \infty$ for some $\alpha > 0$. Given $i \ne i_0$, assume that there exists a policy $\theta_i$ such that both the expected time and the expected cost of a first passage from $i$ to $i_0$ are finite. Then $h_\alpha(i) \le c_{i,i_0}(\theta_i)$, and hence (A2) holds for $i_0$ with $H(i) = c_{i,i_0}(\theta_i)$.

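Proof. If the process begins in state $i \ne i_0$ and follows policy $\theta_i$, it will reach state $i_0$ at some time in the future, denoted by $T$. Let the policy $\psi$ follow $\theta_i$ until $i_0$ is reached, and then follow a discounted-optimal policy $f_\alpha$. Then we have

$$\begin{aligned} J_\alpha^*(i) \le J_\alpha(i,\psi) &= E^{\theta_i}\left[\int_0^T e^{-\alpha t} c(x(t),a(t))\,dt \,\middle|\, x(0) = i\right] + E^{\theta_i}\left[e^{-\alpha T} \,\middle|\, x(0) = i\right] J_\alpha^*(i_0) \\ &\le E^{\theta_i}\left[\int_0^T c(x(t),a(t))\,dt \,\middle|\, x(0) = i\right] + J_\alpha^*(i_0) = c_{i,i_0}(\theta_i) + J_\alpha^*(i_0). \end{aligned} \qquad (11)$$

The result follows by subtracting $J_\alpha^*(i_0)$ from both sides. □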
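For a single fixed policy, Proposition 3.3 reduces to $0 \le h_\alpha(i) \le c_{i,i_0}$ whenever the value function is increasing. The Python sketch below checks this bound exactly on a hypothetical truncated M/M/1 queue ($\lambda = 1$, $\mu = 2$, $c(i) = i$, $i_0 = 0$). It assumes the same tridiagonal solve of $(\alpha I - Q)J = c$ for $J_\alpha$, and computes $c_{i,0}$ by summing downward passage costs, which is valid only because a birth-death chain moves down one step at a time.

```python
from fractions import Fraction


def discounted_costs(lam, mu, n_max, alpha, cost):
    """J_alpha(i) for an M/M/1 queue truncated at n_max, by solving (alpha*I - Q) J = c."""
    n = n_max + 1
    sub = [-mu if i > 0 else Fraction(0) for i in range(n)]       # entries (i, i-1)
    sup = [-lam if i < n_max else Fraction(0) for i in range(n)]  # entries (i, i+1)
    diag = [alpha - sub[i] - sup[i] for i in range(n)]            # main diagonal
    cp, dp = [Fraction(0)] * n, [Fraction(0)] * n
    for i in range(n):  # forward sweep of the Thomas algorithm
        denom = diag[i] - (sub[i] * cp[i - 1] if i > 0 else 0)
        cp[i] = sup[i] / denom
        dp[i] = (cost(i) - (sub[i] * dp[i - 1] if i > 0 else 0)) / denom
    J = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):  # back substitution
        J[i] = dp[i] - (cp[i] * J[i + 1] if i < n - 1 else 0)
    return J


def passage_costs_to_zero(lam, mu, n_max, cost):
    """c_{i,0} for i = 0..n_max (entry 0 set to 0), from downward passage costs."""
    down = [Fraction(0)] * (n_max + 2)  # down[j]: expected cost of a passage from j to j-1
    for j in range(n_max, 0, -1):
        down[j] = (cost(j) + lam * down[j + 1]) / mu
    out = [Fraction(0)] * (n_max + 1)
    for i in range(1, n_max + 1):
        out[i] = out[i - 1] + down[i]  # i -> i-1 -> ... -> 0
    return out


lam, mu, n_max = Fraction(1), Fraction(2), 12
bound = passage_costs_to_zero(lam, mu, n_max, lambda j: j)
for alpha in (Fraction(1), Fraction(1, 10), Fraction(1, 100)):
    J = discounted_costs(lam, mu, n_max, alpha, lambda j: j)
    h = [x - J[0] for x in J]
    print(alpha, all(0 <= h[i] <= bound[i] for i in range(n_max + 1)))
```

The bound holds uniformly in $\alpha$, which is the role played by $H(i) = c_{i,i_0}(\theta_i)$ in (A2).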


Remark: Proposition 3.3 gives a way to construct a function $H(i)$. From the remark below Proposition 4.1, it is known that $c_{i,i_0}(\theta_i)$ is a quite good choice for $H(i)$.

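Proof of Theorem 2.1. We only need to prove that (A1)-(A3) hold under the conditions of Theorem 2.1. Let $i_0 = 0$. It follows from (10) that $J_R(d) \ge \alpha \pi_0(d) J_\alpha(0,d) \ge \alpha \pi_0(d) J_\alpha^*(0)$. Hence $\alpha J_\alpha^*(0) \le J_R(d)/\pi_0(d) < \infty$, where the finiteness holds because $J_R(d) = c_{0,0}(d)/m_{0,0}(d) < \infty$ by Proposition 3.1 (ii), and $\pi_0(d) > 0$ since $m_{0,0}(d) < \infty$ makes state 0 positive recurrent under $d$. Therefore, (A1) holds. From Proposition 3.3 we know that (A2) holds with $H(i) = c_{i,0}(d)$ for $i \ne 0$ and $H(0) = 0$. Since $J_\alpha^*(i)$ is increasing in $i$, it follows that $h_\alpha(i) \ge 0$, and hence (A3) holds with $L = 0$. It follows from (8) and the fact that $h_\alpha(i)$ is increasing in $i$ that $h^*(i)$ is increasing in $i$. □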


4 Sufficient Conditions for ACOE to Hold 

Proposition 5.11 of [1] gives an example demonstrating that (SEN-C) is not sufficient to guarantee that the average-cost optimality equality (ACOE) holds, i.e., the ACOI might be strict. [3] gives a condition under which the ACOE holds, i.e., the inequality in (9) is in fact an equality. However, in many situations this condition is hard to verify, and in some cases it even fails to hold because of an improper choice of the function $H(i)$.

In this section, we give conditions under which the ACOE holds. We first introduce some notation. Let $\mathcal{R}(i,G)$ be the class of policies $\theta$ satisfying

$$P_\theta\big(x(t) \in G \text{ for some } t > 0, \text{ at least one transition occurs} \,\big|\, x(0) = i\big) = 1,$$

such that the expected time $m_{i,G}(\theta)$ of a first passage from $i$ to $G$ (during which at least one transition occurs) is finite. Let $\mathcal{R}^*(i,G)$ be the class of policies $\theta \in \mathcal{R}(i,G)$ such that the expected cost $c_{i,G}(\theta)$ of a first passage from $i$ to $G$ (during which at least one transition occurs) is finite. If $G = \{x\}$, then $\mathcal{R}(i,G)$ is denoted by $\mathcal{R}(i,x)$ (respectively, $\mathcal{R}^*(i,G)$ by $\mathcal{R}^*(i,x)$).

Proposition 4.1. Assume that Assumptions (A1)-(A3) hold, and that for some state $i$ and nonempty set $G$ there exists a policy $\theta \in \mathcal{R}^*(i,G)$ such that $\sum_{j \in G} H(j)\, P_\theta(x(T) = j) < \infty$, where $T$ is the first passage time from $i$ to $G$ and $H$ is the function from (A2). Then for any limit function $h^*$, we have

$$h^*(i) \le c_{i,G}(\theta) - g^* m_{i,G}(\theta) + E_\theta[h^*(x(T)) \mid x(0) = i]. \qquad (12)$$


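Proof. By a derivation very similar to that of (11), we have

$$J_\alpha^*(i) \le c_{i,G}(\theta) + E_\theta\big[e^{-\alpha T} J_\alpha^*(x(T)) \,\big|\, x(0) = i\big],$$

which can be written as

$$h_\alpha(i) \le c_{i,G}(\theta) - \alpha J_\alpha^*(i_0)\, \frac{1 - E_\theta[e^{-\alpha T} \mid x(0) = i]}{\alpha} + E_\theta\big[e^{-\alpha T} h_\alpha(x(T)) \,\big|\, x(0) = i\big]. \qquad (13)$$

Note that

$$\frac{1 - E_\theta[e^{-\alpha T} \mid x(0) = i]}{\alpha} = E_\theta\left[\int_0^T e^{-\alpha s}\,ds \,\middle|\, x(0) = i\right]. \qquad (14)$$

The term $\int_0^T e^{-\alpha s}\,ds$ is decreasing in $\alpha$. It follows from the monotone convergence theorem that, as $\alpha \to 0^+$, the limit of the left side of (14) exists and equals $E_\theta[T \mid x(0) = i] = m_{i,G}(\theta)$.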

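Choose a discount factor sequence $\{\alpha_n, n \ge 1\}$ tending to zero such that (7) and (8) hold, and take the limit of both sides of (13) as $\alpha_n \to 0^+$. By (8), the left side converges to $h^*(i)$; by (7) and (14), the second term on the right converges to $-g^* m_{i,G}(\theta)$. For the last term, note that $e^{-\alpha_n T} h_{\alpha_n}(x(T))$ converges to $h^*(x(T))$ as $n \to \infty$. Since $|e^{-\alpha_n T} h_{\alpha_n}(x(T))|$ is bounded by $\max(L, H(x(T)))$ from (A2) and (A3), and $E_\theta[\max(L, H(x(T)))] \le L + E_\theta[H(x(T))] = L + \sum_{j \in G} H(j)\, P_\theta(x(T) = j) < \infty$, the dominated convergence theorem yields

$$\lim_{n \to \infty} E_\theta\big[e^{-\alpha_n T} h_{\alpha_n}(x(T)) \,\big|\, x(0) = i\big] = E_\theta[h^*(x(T)) \mid x(0) = i].$$

Therefore, (12) holds. □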

Now we give sufficient conditions under which the ACOE holds. 

Theorem 4.1. Assume that Assumptions (A1)-(A3) hold, and let $e$ be a stationary policy realizing the minimum in the ACOI. Define the nonnegative discrepancy function $\Phi$ by

$$g^* = c(i,e) + \Phi(i) + \sum_{j \in S} q(j|i,e)\, h^*(j), \quad i \in S. \qquad (15)$$

Then $\Phi(i) = 0$, and hence the ACOE holds at the particular state $i$, under any of the following conditions:

(i) There exists a nonempty set $G$ such that $e \in \mathcal{R}(i,G)$ and $\sum_{j \in G} H(j)\, P_e(x(T) = j) < \infty$, where $T$ is the first passage time from $i$ to $G$.

(ii) $e \in \mathcal{R}(i,i_0)$.

(iii) The Markov process induced by $e$ is positive recurrent at $i$.

(iv) $\sum_{j \in S} q(j|i,a)\, H(j) < \infty$ for all $a \in A(i)$. This condition typically holds when the jump size at each state $i$ is bounded, so that there are only finitely many $j$ such that $q(j|i,a) > 0$ for each $i \in S$.


Proof. To prove equality under (i), let the process operate under $e$, and suppress the initial state $i$. Since the first passage time $T$ from $i$ to $G$ is a stopping time with $E_e[T] = m_{i,G}(e) < \infty$ as $e \in \mathcal{R}(i,G)$, it follows from Dynkin's formula (see [8]) that

$$E_e[h^*(x(T))] = h^*(i) + E_e\left[\int_0^T \sum_{j \in S} q(j|x(s),e)\, h^*(j)\,ds\right].$$

From (15), it is known that

$$E_e[h^*(x(T))] = h^*(i) + E_e\left[\int_0^T \big(g^* - c(x(s),e) - \Phi(x(s))\big)\,ds\right],$$

and thus

$$c_{i,G}(e) - g^* m_{i,G}(e) + E_e\left[\int_0^T \Phi(x(s))\,ds\right] + E_e[h^*(x(T))] = h^*(i), \qquad (16)$$

which implies that $c_{i,G}(e) < \infty$, and hence $e \in \mathcal{R}^*(i,G)$. Therefore, we can apply Proposition 4.1, which yields

$$c_{i,G}(e) - g^* m_{i,G}(e) + E_e[h^*(x(T))] \ge h^*(i).$$

Comparing the above inequality with (16) and keeping in mind that $\Phi$ is nonnegative, we know that $\Phi = 0$ during the first passage from $i$ to $G$. In particular, $\Phi(i) = 0$, and thus the ACOE holds at state $i$.

(ii) follows from (i) by choosing $G = \{i_0\}$ and the fact that $h^*(i_0) = 0$.

(iii) follows from (i) by noting that if the Markov process induced by $e$ is positive recurrent at $i$, then $e \in \mathcal{R}(i,i)$.

(iv) follows from the same argument as in Theorem 5.9 in [4]. □

Remark: If, starting from an arbitrary initial state $i$, the Markov process induced by $e$ reaches a finite set $G$ in a finite expected amount of time, then the ACOE holds.

5 A Queueing Example 

Example 1. A single-server, 2-buffer queueing model. Consider a server serving two types of customers: type 1 and type 2. Type 1 and type 2 customers form queue 1 and queue 2, respectively, and arrive according to two independent Poisson processes with parameters $\lambda_1$ and $\lambda_2$, respectively. The buffers of both queues are assumed to be infinitely large. The service times of type 1 and type 2 customers are exponentially distributed with parameters $\mu_1$ and $\mu_2$, respectively. While waiting in queue, a type 1 customer may change to a type 2 customer after a random time, which is exponentially distributed with parameter $\lambda_T$. The holding cost per unit time of a customer in queue 1 and queue 2 is $h_1$ and $h_2$, respectively. When a type 1 customer upgrades, the cost of transferring from queue 1 to queue 2 is $c$ per customer. The server must decide which buffer to serve in order to minimize the average cost.

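The state is denoted by $\mathbf{q} = (q_1, q_2)$, where $q_i$ is the length of queue $i$, $i = 1, 2$. For each state $\mathbf{q}$, the corresponding action set is

$$A(\mathbf{q}) = \begin{cases} \{0\}, & \text{if } \mathbf{q} = (0,0), \\ \{1\}, & \text{if } \mathbf{q} = (q_1, 0),\ q_1 > 0, \\ \{2\}, & \text{if } \mathbf{q} = (0, q_2),\ q_2 > 0, \\ \{1, 2\}, & \text{otherwise}, \end{cases}$$

where action $k \in \{1, 2\}$ means serving queue $k$ and action 0 means idling.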
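The corresponding transition rates are

$$q(\mathbf{q}'|\mathbf{q},1) = \begin{cases} \mu_1, & \text{if } \mathbf{q}' = (q_1 - 1, q_2), \\ \lambda_1, & \text{if } \mathbf{q}' = (q_1 + 1, q_2), \\ \lambda_2, & \text{if } \mathbf{q}' = (q_1, q_2 + 1), \\ q_1 \lambda_T, & \text{if } \mathbf{q}' = (q_1 - 1, q_2 + 1), \\ -(\mu_1 + \lambda_1 + \lambda_2 + q_1 \lambda_T), & \text{if } \mathbf{q}' = (q_1, q_2), \\ 0, & \text{otherwise}; \end{cases}$$

for $q_1 > 0$,

$$q(\mathbf{q}'|\mathbf{q},2) = \begin{cases} \mu_2, & \text{if } \mathbf{q}' = (q_1, q_2 - 1), \\ \lambda_1, & \text{if } \mathbf{q}' = (q_1 + 1, q_2), \\ \lambda_2, & \text{if } \mathbf{q}' = (q_1, q_2 + 1), \\ q_1 \lambda_T, & \text{if } \mathbf{q}' = (q_1 - 1, q_2 + 1), \\ -(\mu_2 + \lambda_1 + \lambda_2 + q_1 \lambda_T), & \text{if } \mathbf{q}' = (q_1, q_2), \\ 0, & \text{otherwise}; \end{cases}$$

and for $q_1 = 0$,

$$q(\mathbf{q}'|\mathbf{q},2) = \begin{cases} \mu_2, & \text{if } \mathbf{q}' = (q_1, q_2 - 1), \\ \lambda_1, & \text{if } \mathbf{q}' = (q_1 + 1, q_2), \\ \lambda_2, & \text{if } \mathbf{q}' = (q_1, q_2 + 1), \\ -(\mu_2 + \lambda_1 + \lambda_2), & \text{if } \mathbf{q}' = (q_1, q_2), \\ 0, & \text{otherwise}. \end{cases}$$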


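Moreover, at $\mathbf{q} = (0,0)$ the server idles and only arrivals occur: $q((1,0)|(0,0),0) = \lambda_1$, $q((0,1)|(0,0),0) = \lambda_2$, $q((0,0)|(0,0),0) = -(\lambda_1 + \lambda_2)$, and $q(\mathbf{q}'|(0,0),0) = 0$ otherwise.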

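Let $x(t) = (x_1(t), x_2(t))$ be the state at time $t$, where $x_i(t)$ is the length of queue $i$ at time $t$, $i = 1, 2$, and let $N^\pi(t)$ be the total number of transferred customers up to time $t$ under policy $\pi$. The expected discounted cost function under policy $\pi$ in this example can be formulated as

$$J_\alpha(\mathbf{q},\pi) = E_{\mathbf{q}}^\pi\left[\int_0^\infty e^{-\alpha t}\big(c\,dN^\pi(t) + (h_1 x_1(t) + h_2 x_2(t))\,dt\big)\right] = E_{\mathbf{q}}^\pi\left[\int_0^\infty e^{-\alpha t}\big(h_1 x_1(t) + h_2 x_2(t) + c\lambda_T x_1(t)\big)\,dt\right].$$

Hence, the cost rate function is $c(\mathbf{q},1) = c(\mathbf{q},2) = c(\mathbf{q}) = h_1 q_1 + h_2 q_2 + c q_1 \lambda_T$.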
We have the following result for Example 1. 



Proposition 5.1. Suppose that $\lambda_1 + \lambda_2 < \min(\mu_1, \mu_2)$. Then there exists an average-cost optimal stationary policy for Example 1, and the ACOE holds.
Proof. We apply Theorem 2.1 by proving that

(i) $J_\alpha^*(\mathbf{q})$ is increasing in $\mathbf{q}$;

(ii) The priority service (PS) policy is a $\mathbf{0} = (0,0)$-standard policy. The PS policy specifies that the server always chooses a customer in (nonempty) queue 2 to serve at each decision epoch. If queue 2 is empty, the server serves customers in queue 1, if there are any. If the server is serving a type 1 customer when a type 2 customer arrives, the type 1 customer is pushed back to queue 1 and the server begins to serve the type 2 customer. The interrupted type 1 customer resumes or repeats its service when the server becomes available to serve type 1 customers. If the system is empty, the server is idle.

To prove (i), denote the discounted-optimal stationary policy by $\pi^*$. At state $(q_1, q_2)$, we add a virtual type 1 customer to queue 1. He has the same transfer rate as an ordinary type 1 customer, but incurs no holding or transfer cost. For this queueing system $G(q_1, q_2; 1, 0)$, policy $\pi^*$ can still be used, and by comparing each realized trajectory we know that the resulting expected discounted cost $C(G(q_1, q_2; 1, 0))$ is no greater than $J_\alpha^*(q_1 + 1, q_2)$. On the other hand, the queueing system $G(q_1, q_2; 1, 0)$ is in fact a queueing system with state $(q_1, q_2)$, and since policy $\pi^*$ might not be optimal for it, we have $C(G(q_1, q_2; 1, 0)) \ge J_\alpha^*(q_1, q_2)$. Therefore, $J_\alpha^*(q_1 + 1, q_2) \ge J_\alpha^*(q_1, q_2)$, and thus $J_\alpha^*(q_1, q_2)$ is increasing in $q_1$. Similarly, $J_\alpha^*(q_1, q_2)$ is increasing in $q_2$. Thus, $J_\alpha^*(\mathbf{q})$ is increasing in $\mathbf{q}$.

Let $d$ be the PS policy. From [?] it is known that the Markov process induced by $d$ is ergodic, and thus $m_{\mathbf{q},\mathbf{0}}(d) < \infty$, $\forall \mathbf{q} \in S$. Next we prove that $c_{\mathbf{q},\mathbf{0}}(d) < \infty$, $\forall \mathbf{q} \in S$.

Inspired by [?], we choose the Lyapunov function $r(\mathbf{q}) = K r_1^{q_1} r_2^{q_2}$ and then apply Lemma 2.1 with $H^* = \{\mathbf{0}\}$, where the constants $K$, $r_1$, $r_2$ are specified below. Under the PS policy, condition (6) requires that


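$$c(\mathbf{q}) + K r_1^{q_1} r_2^{q_2}\left[\mu_2\left(\frac{1}{r_2} - 1\right) + \lambda_1(r_1 - 1) + \lambda_2(r_2 - 1) + q_1\lambda_T\left(\frac{r_2}{r_1} - 1\right)\right] \le 0, \quad q_1 \ge 0,\ q_2 \ge 1, \qquad (17)$$

and

$$c(q_1, 0) + K r_1^{q_1}\left[\mu_1\left(\frac{1}{r_1} - 1\right) + \lambda_1(r_1 - 1) + \lambda_2(r_2 - 1) + q_1\lambda_T\left(\frac{r_2}{r_1} - 1\right)\right] \le 0, \quad q_1 \ge 1. \qquad (18)$$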


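Choose $r_2 = r_1$ and $r_1 > 1$ such that $r_1 < \min(\mu_1, \mu_2)/(\lambda_1 + \lambda_2)$, which is possible since $\lambda_1 + \lambda_2 < \min(\mu_1, \mu_2)$, and denote $\delta = \min(\mu_1, \mu_2)/r_1 - \lambda_1 - \lambda_2$. By the choice of $r_1$, $\delta > 0$. Since $r_2 = r_1$, the transfer term vanishes, and the bracket in (17) equals $(r_1 - 1)(\lambda_1 + \lambda_2 - \mu_2/r_1) \le -(r_1 - 1)\delta$; likewise, the bracket in (18) is at most $-(r_1 - 1)\delta$. Moreover, $r_1^n \ge 1 + n(r_1 - 1) > n(r_1 - 1)$ for every integer $n \ge 0$. Choose $K$ such that $K(r_1 - 1)^2 \delta \ge \max(h_1 + c\lambda_T, h_2)$. Therefore, for $q_1 \ge 0$ and $q_2 \ge 1$,

$$\begin{aligned} c(\mathbf{q}) + K r_1^{q_1} r_2^{q_2}\left[\mu_2\left(\frac{1}{r_2} - 1\right) + \lambda_1(r_1 - 1) + \lambda_2(r_2 - 1) + q_1\lambda_T\left(\frac{r_2}{r_1} - 1\right)\right] &\le c(\mathbf{q}) - K r_1^{q_1 + q_2}(r_1 - 1)\delta \\ &\le h_1 q_1 + h_2 q_2 + c q_1 \lambda_T - K(r_1 - 1)^2 \delta\, (q_1 + q_2) \\ &\le 0, \end{aligned}$$

and thus (17) holds. Similarly, (18) also holds. Therefore, it follows from Lemma 2.1 that $c_{\mathbf{q},\mathbf{0}}(d) \le r(\mathbf{q}) < \infty$ for $\mathbf{q} \ne \mathbf{0}$. Besides, it is easily seen that

$$c_{\mathbf{0},\mathbf{0}}(d) = \frac{\lambda_1}{\lambda_1 + \lambda_2}\, c_{(1,0),\mathbf{0}}(d) + \frac{\lambda_2}{\lambda_1 + \lambda_2}\, c_{(0,1),\mathbf{0}}(d) < \infty.$$

Therefore, the PS policy $d$ is a $\mathbf{0}$-standard policy, and thus (ii) is proved.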

It follows from Theorem 2.1 that an average-cost optimal stationary policy exists and the ACOI is satisfied. Besides, condition (iv) in Theorem 4.1 is satisfied, as there are only finitely many $j$ such that $q(j|i,a) \ne 0$ for any $i \in S$, $a \in A(i)$. Therefore, the ACOE holds for any $i \in S$. □
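The drift conditions (17) and (18) can also be sanity-checked numerically. The Python sketch below is an illustration under assumed parameter values ($\lambda_1 = 1$, $\lambda_2 = 0.5$, $\mu_1 = 2$, $\mu_2 = 2.5$, $\lambda_T = 0.3$, $h_1 = 1$, $h_2 = 2$, $c = 0.5$, none of which come from the paper). It encodes the transition rates of Example 1 under the PS policy and the function $r(\mathbf{q}) = K r_1^{q_1} r_2^{q_2}$ with one admissible choice of constants, $r_1 = r_2 = r$ with $1 < r < \min(\mu_1,\mu_2)/(\lambda_1+\lambda_2)$, $\delta = \min(\mu_1,\mu_2)/r - \lambda_1 - \lambda_2$ and $K(r-1)^2\delta = \max(h_1 + c\lambda_T, h_2)$, and then evaluates the left side of (6) on a finite grid. Checking finitely many states complements, but does not replace, the analytic argument.

```python
# Numerical sanity check (not a proof) of the drift conditions (17)-(18) for
# Example 1 under the PS policy. All parameter values are hypothetical and
# satisfy lambda_1 + lambda_2 < min(mu_1, mu_2).
LAM1, LAM2 = 1.0, 0.5         # arrival rates
MU1, MU2 = 2.0, 2.5           # service rates
LAM_T = 0.3                   # transfer rate of a waiting type 1 customer
H1, H2, C_T = 1.0, 2.0, 0.5   # holding costs h_1, h_2 and transfer cost c

MU_MIN, LAM = min(MU1, MU2), LAM1 + LAM2
R = 1.0 + 0.5 * (MU_MIN / LAM - 1.0)   # any 1 < R < MU_MIN / LAM works
DELTA = MU_MIN / R - LAM               # > 0 by the choice of R
K = max(H1 + C_T * LAM_T, H2) / ((R - 1.0) ** 2 * DELTA)


def ps_action(q1: int, q2: int) -> int:
    """PS policy: serve queue 2 if nonempty, else queue 1, else idle (action 0)."""
    if q2 > 0:
        return 2
    return 1 if q1 > 0 else 0


def off_diagonal_rates(q1: int, q2: int, a: int) -> dict:
    """Off-diagonal rates q(q'|q,a) of Example 1, returned as {q': rate}."""
    rates = {(q1 + 1, q2): LAM1, (q1, q2 + 1): LAM2}
    if q1 > 0:
        rates[(q1 - 1, q2 + 1)] = q1 * LAM_T  # a type 1 customer transfers to queue 2
    if a == 1:
        rates[(q1 - 1, q2)] = MU1
    elif a == 2:
        rates[(q1, q2 - 1)] = MU2
    return rates


def cost_rate(q1: int, q2: int) -> float:
    return H1 * q1 + H2 * q2 + C_T * q1 * LAM_T


def lyapunov(q1: int, q2: int) -> float:
    return K * R ** q1 * R ** q2


def lhs_of_6(q1: int, q2: int) -> float:
    """c(q) + sum_{q'} q(q'|q, d(q)) r(q'), written as a sum of rate * (r(q') - r(q))."""
    rates = off_diagonal_rates(q1, q2, ps_action(q1, q2))
    drift = sum(rate * (lyapunov(*nxt) - lyapunov(q1, q2)) for nxt, rate in rates.items())
    return cost_rate(q1, q2) + drift


def max_lhs_on_grid(n: int) -> float:
    """Largest left-hand side of (17)-(18) over 0 <= q1, q2 <= n, excluding (0, 0)."""
    return max(lhs_of_6(q1, q2)
               for q1 in range(n + 1) for q2 in range(n + 1) if (q1, q2) != (0, 0))


print(max_lhs_on_grid(40) <= 0.0)  # prints: True
```

Setting $r_1 = r_2$ is what makes the unbounded transfer rate $q_1\lambda_T$ harmless: a transfer leaves $q_1 + q_2$, and hence $r(\mathbf{q})$, unchanged.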


Remark: From the proof we know that the result still holds if the cost rate function is increasing 
and polynomial in q. 